Take Home Exercise 1

Author

Hulwana

1 Overview

1.1 Getting Started

In the code chunk below, p_load() of the pacman package is used to install and load the following R packages into the R environment:

  • sf is used for importing and handling geospatial data in R,

  • tidyverse is mainly used for wrangling attribute data in R,

  • tmap will be used to prepare cartographic quality choropleth maps,

  • spdep will be used to compute spatial weights, global and local spatial autocorrelation statistics, and

  • funModeling will be used for rapid Exploratory Data Analysis.

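# readr, dplyr and tidyr are core tidyverse packages; they are also listed
# explicitly so that each one is guaranteed to be installed and loaded
pacman::p_load(sf, tidyverse, tmap, spdep, readr, dplyr, tidyr, funModeling)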

1.2 Importing Geospatial Data

In this exercise, two geospatial datasets will be used:

  • geo_export, the water point data

  • geoBoundaries-NGA-ADM2, the Nigeria Local Government Area (LGA) boundary data

1.2.1 Importing Water Point Data

First, we are going to import the water point geospatial data (i.e. geo_export) by using the code chunk below.

# This chunk only needs to run once: later sections read data/wp_nga.rds instead
wp <- st_read(dsn = "data",
              layer = "geo_export",
              crs = 4326) %>%
  filter(clean_coun == "Nigeria")

Things to learn from the code chunk above:

  • st_read() of the sf package is used to import the geo_export shapefile into the R environment and save the imported geospatial data into a simple feature data table.

  • filter() of the dplyr package is used to extract the water point records of Nigeria.

Next, write_rds() of the readr package is used to save the extracted sf data table (i.e. wp) into an output file in rds data format. The output file is called wp_nga.rds and it is saved in the data sub-folder.

write_rds(wp, "data/wp_nga.rds")
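write_rds() and read_rds() are thin wrappers around base R's saveRDS() and readRDS(), so an object written this way, including an sf data table, should come back with its class and attributes intact. The minimal sketch below checks that round trip on a small hypothetical data frame written to a temporary file rather than to the real data folder.

```r
library(readr)

# Hypothetical toy table standing in for the water point data
toy <- data.frame(status_cle = c("Functional", NA), n = c(1L, 2L))

# Write to a temporary .rds file and read it back
path <- tempfile(fileext = ".rds")
write_rds(toy, path)
restored <- read_rds(path)

# TRUE if the object survived the round trip unchanged
identical(toy, restored)
```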

1.2.2 Importing Nigeria LGA Boundary Data

Now, we are going to import the LGA boundary data into R environment by using the code chunk below.

nga <- st_read(dsn = "data",
               layer = "geoBoundaries-NGA-ADM2",
               crs = 4326)

Things to learn from the code chunk above:

  • st_read() of the sf package is used to import the geoBoundaries-NGA-ADM2 shapefile into the R environment and save the imported geospatial data into a simple feature data table.

1.3 Data Wrangling

1.3.1 Recoding NA values into string

In the code chunk below, replace_na() of the tidyr package is used to recode all the NA values in the status_cle field into "Unknown".

wp_nga <- read_rds("data/wp_nga.rds") %>%
  dplyr::mutate(status_cle = 
           replace_na(status_cle, "Unknown"))
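Recoding matters because filter() drops rows whose condition evaluates to NA, so without it a comparison such as status_cle == "Unknown" could never select the missing statuses. The minimal sketch below applies the same replace_na() call to a hypothetical three-row tibble.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two known statuses and one missing value
toy <- tibble(status_cle = c("Functional", NA, "Abandoned"))

# Replace the missing status with the string "Unknown"
toy <- toy %>%
  mutate(status_cle = replace_na(status_cle, "Unknown"))

toy$status_cle
# "Functional" "Unknown"    "Abandoned"
```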

1.3.2 EDA

In the code chunk below, freq() of funModeling package is used to display the distribution of status_cle field in wp_nga.

freq(data=wp_nga, 
     input = 'status_cle')

The bar chart above shows that slightly less than 50% of the water points in Nigeria are functional. It is therefore important to dig deeper to determine whether there are significant patterns in the areas that do not have functional water points, and whether neighbouring areas can support those areas that face water scarcity.

Observe that there are two categories with similar names (i.e. ‘Non-functional due to dry season’ and ‘Non functional due to dry season’). We will standardise these later by changing both to ‘Non-functional due to dry season’.
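One way to do this standardisation is a single mutate() with dplyr's if_else(), sketched below on a hypothetical toy tibble. The target spelling is an assumption: it follows the capitalisation used in the filter in section 1.4.2, and should be adjusted if the data uses a different form.

```r
library(dplyr)

# Hypothetical toy data containing both spellings of the dry-season status
toy <- tibble(status_cle = c("Non functional due to dry season",
                             "Non-Functional due to dry season",
                             "Functional"))

# Map the variant spelling onto a single label; other values are left as-is
# (NA values were already recoded to "Unknown", so the condition is never NA)
toy <- toy %>%
  mutate(status_cle = if_else(status_cle == "Non functional due to dry season",
                              "Non-Functional due to dry season",
                              status_cle))

table(toy$status_cle)
```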

1.4 Extracting Water Point Data

In this section, we will extract the water point records by using classes in status_cle field.

1.4.1 Extracting functional water point

In the code chunk below, filter() of dplyr is used to select functional water points.

wpt_functional <- wp_nga %>%
  filter(status_cle %in%
           c("Functional", 
             "Functional but not in use",
             "Functional but needs repair"))
freq(data = wpt_functional,
     input = "status_cle")

1.4.2 Extracting non-functional water point

In the code chunk below, filter() of dplyr is used to select non-functional water points.

wpt_nonfunctional <- wp_nga %>%
  filter(status_cle %in%
           c("Abandoned/Decommissioned", 
             "Abandoned",
             "Non-Functional",
             "Non functional due to dry season",
             "Non-Functional due to dry season"))
freq(data=wpt_nonfunctional, 
     input = 'status_cle')

1.4.3 Extracting water point with Unknown class

In the code chunk below, filter() of dplyr is used to select water points with unknown status.

wpt_unknown <- wp_nga %>%
  filter(status_cle == "Unknown")
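The functional, non-functional and unknown subsets only add up to the full data if every status value appears in exactly one of the three filters. A minimal sketch of a sanity check, assuming a hypothetical toy tibble, labels each row with case_when() and lists any status that none of the filters would pick up.

```r
library(dplyr)

# Hypothetical toy statuses; "Unexpected status" matches none of the filters
toy <- tibble(status_cle = c("Functional", "Functional but needs repair",
                             "Abandoned", "Unknown", "Unexpected status"))

functional <- c("Functional", "Functional but not in use",
                "Functional but needs repair")
nonfunctional <- c("Abandoned/Decommissioned", "Abandoned", "Non-Functional",
                   "Non functional due to dry season",
                   "Non-Functional due to dry season")

toy <- toy %>%
  mutate(group = case_when(
    status_cle %in% functional    ~ "functional",
    status_cle %in% nonfunctional ~ "non-functional",
    status_cle == "Unknown"       ~ "unknown",
    TRUE                          ~ NA_character_  # not covered by any filter
  ))

# Statuses that would be missing from all three subsets
unmatched <- toy$status_cle[is.na(toy$group)]
unmatched
```

Running the same check on wp_nga would show whether any water points fall outside all three subsets.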

1.5 Performing Point-in-Polygon Count

nga_wp <- nga %>% 
  mutate(`total wpt` = lengths(
    st_intersects(nga, wp_nga))) %>%
  mutate(`wpt functional` = lengths(
    st_intersects(nga, wpt_functional))) %>%
  mutate(`wpt non-functional` = lengths(
    st_intersects(nga, wpt_nonfunctional))) %>%
  mutate(`wpt unknown` = lengths(
    st_intersects(nga, wpt_unknown)))
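The count works because st_intersects() returns, for each polygon, the indices of the points that intersect it, and lengths() turns that list into one count per polygon. The minimal sketch below reproduces the idea with two hypothetical square "LGAs" and three points; note that st_intersects() also counts points lying exactly on a boundary, so a point on a shared edge would be counted in both neighbouring polygons.

```r
library(sf)

# Helper that builds a hypothetical unit square starting at x0
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}
lgas <- st_sf(name = c("A", "B"), geometry = st_sfc(sq(0), sq(2)))

# Toy points: two inside A, none inside B, one outside both polygons
pts <- st_sf(id = 1:3,
             geometry = st_sfc(st_point(c(0.2, 0.5)),
                               st_point(c(0.8, 0.5)),
                               st_point(c(5, 5))))

# One count per polygon, in the same order as the rows of lgas
counts <- lengths(st_intersects(lgas, pts))
counts
# 2 0
```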

1.6 Saving the Analytical Data Table

nga_wp <- nga_wp %>%
  mutate(pct_functional = `wpt functional`/`total wpt`) %>%
  mutate(`pct_non-functional` = `wpt non-functional`/`total wpt`) %>%
  dplyr::select(1, 6:10)

Things to learn from the code chunk above:

  • mutate() of dplyr package is used to derive two fields namely pct_functional and pct_non-functional.

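  • to keep the file size small, select() of dplyr is used to retain only fields 1 and 6 to 10, chosen by position. Note that the two derived percentage fields are not among these positions: the printed output in the summary statistics section shows only shapeName, the four water point counts and the geometry.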
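Two details are worth guarding against here. An LGA with no water points, such as Abadam (total wpt of 0 in the summary statistics), gives 0/0, which R evaluates to NaN; and positional selection silently changes meaning if the column order changes. The minimal sketch below, on a hypothetical toy tibble, returns NA for empty LGAs and selects columns by name instead (on an sf object, select() keeps the geometry column automatically).

```r
library(dplyr)

# Hypothetical toy counts; LGA B has no water points at all
toy <- tibble(shapeName = c("LGA A", "LGA B"),
              `total wpt` = c(10L, 0L),
              `wpt functional` = c(4L, 0L),
              `wpt non-functional` = c(5L, 0L),
              Level = c("ADM2", "ADM2"))  # an extra field we want to drop

toy <- toy %>%
  # Use NA rather than NaN when an LGA has no water points
  mutate(pct_functional = if_else(`total wpt` > 0,
                                  `wpt functional` / `total wpt`, NA_real_),
         `pct_non-functional` = if_else(`total wpt` > 0,
                                        `wpt non-functional` / `total wpt`,
                                        NA_real_)) %>%
  # Select by name so the result does not depend on column positions
  select(shapeName, `total wpt`, `wpt functional`, `wpt non-functional`,
         pct_functional, `pct_non-functional`)

toy$pct_functional
# 0.4  NA
```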

We now have a tidy sf data table for subsequent analysis, which we save in rds format.

write_rds(nga_wp, "data/nga_wp.rds")

1.7 Visualising the Spatial Distribution of Water Points

nga_wp <- read_rds("data/nga_wp.rds")
total <- qtm(nga_wp, "total wpt")
wp_functional <- qtm(nga_wp, "wpt functional")
wp_nonfunctional <- qtm(nga_wp, "wpt non-functional")
unknown <- qtm(nga_wp, "wpt unknown")

tmap_mode("view")
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=1, ncol=2)

Based on the maps above, the north-west zone has the most functional water points, while non-functional water points seem to be scattered all over Nigeria.

It is interesting to note that while the district Ifelodun has a relatively high number of functional water points, it also has the highest number of non-functional water points.

Water points with unknown status are mostly found in the north-central zone of Nigeria.

1.8 Summary Statistics of the Data

First, we take a look at the dataset.

head(nga_wp, n= 10)
Simple feature collection with 10 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 3.005022 ymin: 4.888055 xmax: 13.83477 ymax: 13.71406
Geodetic CRS:  WGS 84
        shapeName total wpt wpt functional wpt non-functional wpt unknown
1       Aba North        17              7                  9           1
2       Aba South        71             29                 35           7
3          Abadam         0              0                  0           0
4           Abaji        57             23                 34           0
5            Abak        48             23                 25           0
6       Abakaliki       233             82                 42         109
7  Abeokuta North        34             16                 15           3
8  Abeokuta South       119             72                 33          14
9             Abi       152             79                 62          11
10    Aboh-Mbaise        66             18                 26          22
                         geometry
1  MULTIPOLYGON (((7.401109 5....
2  MULTIPOLYGON (((7.334479 5....
3  MULTIPOLYGON (((13.83477 13...
4  MULTIPOLYGON (((7.045872 9....
5  MULTIPOLYGON (((7.811244 5....
6  MULTIPOLYGON (((8.4109 6.28...
7  MULTIPOLYGON (((3.143903 7....
8  MULTIPOLYGON (((3.301615 7....
9  MULTIPOLYGON (((8.153282 5....
10 MULTIPOLYGON (((7.321909 5....

1.8.1 Top 10 areas with the most functional waterpoints

top_func <- nga_wp %>%
  dplyr::top_n(10, `wpt functional`) %>%
  dplyr::select(shapeName, `wpt functional`)
top_func
Simple feature collection with 10 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 8.646937 ymin: 11.61986 xmax: 10.59487 ymax: 13.13825
Geodetic CRS:  WGS 84
        shapeName wpt functional                       geometry
1            Auyo            500 MULTIPOLYGON (((10.05964 12...
2          Babura            752 MULTIPOLYGON (((9.037085 12...
3        Biriniwa            645 MULTIPOLYGON (((10.2548 13....
4        Gagarawa            412 MULTIPOLYGON (((9.690529 12...
5         Kaugama            520 MULTIPOLYGON (((9.886559 12...
6    Kiri Kasamma            399 MULTIPOLYGON (((10.259 12.4...
7          Kiyawa            460 MULTIPOLYGON (((9.660932 11...
8           Nguru            405 MULTIPOLYGON (((10.33446 12...
9  Sule-Tankarkar            420 MULTIPOLYGON (((9.028389 12...
10          Taura            591 MULTIPOLYGON (((9.510234 12...
plot(nga_wp)
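The top_func output lists the LGAs alphabetically rather than by count, because top_n() keeps the original row order; top_n() has also been superseded by slice_max() since dplyr 1.0.0. A minimal sketch on a hypothetical toy tibble sorts the result explicitly. With the default with_ties = TRUE, ties at the cut-off can return more than n rows.

```r
library(dplyr)

# Hypothetical toy counts for five LGAs
toy <- tibble(shapeName = c("P", "Q", "R", "S", "T"),
              `wpt functional` = c(5L, 40L, 12L, 40L, 7L))

# Keep the three largest counts, then sort them in descending order
top3 <- toy %>%
  slice_max(`wpt functional`, n = 3) %>%
  arrange(desc(`wpt functional`))

top3$shapeName
```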

Limitations / Further Work

Future work could demarcate the different regions of Nigeria, as outlined below, to better understand whether certain regions face water shortages more severely than others.